Audiovisual Attention Modeling and Salient Event Detection
Authors
Abstract
Although human perception appears to be automatic and unconscious, complex sensory mechanisms exist that form the preattentive component of understanding and lead to awareness. Considerable research has been carried out into these preattentive mechanisms, and computational models have been developed for similar problems in the fields of computer vision and speech analysis. The focus here is to explore aural and visual information in video streams for modeling attention and detecting salient events. The separate aural and visual modules may convey explicit, complementary or mutually exclusive information around the detected audiovisual events. Based on recent studies on perceptual and computational attention modeling, we formulate measures of attention using features of saliency for the audiovisual stream. Audio saliency is captured by signal modulations and related multifrequency band features, extracted through nonlinear operators and energy tracking. Visual saliency is measured by means of a spatiotemporal attention model driven by various feature cues (intensity, color, motion). Features from both modules are mapped to one-dimensional, time-varying saliency curves, from which statistics of salient segments can be extracted and important audio or visual events can be detected through adaptive, threshold-based mechanisms. Audio and video curves are integrated into a single attention curve, where events may be enhanced, suppressed or eliminated. Salient events from the audiovisual curve are detected through geometrical features such as local extrema, sharp transitions and level sets. The potential of inter-module fusion and audiovisual event detection is demonstrated in applications such as video key-frame selection, video skimming and video annotation.
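As a rough illustration of the last steps described in the abstract (combining two saliency curves into one attention curve, then picking salient events from local maxima above an adaptive threshold), a minimal Python sketch follows. It is not the authors' method: the abstract does not specify the fusion rule or the threshold, so the weighted linear fusion, the mean-plus-k-standard-deviations threshold, the function names `normalize`, `fuse` and `detect_events`, and the sample curves are all assumptions made for this example.

```python
from statistics import mean, pstdev


def normalize(curve):
    """Rescale a saliency curve to [0, 1] with min-max normalization."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(audio, visual, w_audio=0.5):
    """Weighted linear fusion of two equal-length curves (assumed scheme)."""
    if len(audio) != len(visual):
        raise ValueError("curves must have the same length")
    a, v = normalize(audio), normalize(visual)
    return [w_audio * x + (1.0 - w_audio) * y for x, y in zip(a, v)]


def detect_events(curve, k=1.0):
    """Return indices of local maxima above mean + k * std (assumed threshold)."""
    threshold = mean(curve) + k * pstdev(curve)
    events = []
    for i in range(1, len(curve) - 1):
        is_peak = curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]
        if is_peak and curve[i] > threshold:
            events.append(i)
    return events


# Made-up per-frame saliency values, for illustration only.
audio = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.2, 0.1]
visual = [0.0, 0.1, 0.8, 0.2, 0.1, 0.6, 0.1, 0.0]

attention = fuse(audio, visual)
events = detect_events(attention)           # strict threshold (k = 1)
loose = detect_events(attention, k=0.0)     # threshold at the curve mean
print(events, loose)
```

In this toy input, the peak at frame 2 is present in both modalities and survives the stricter threshold, while the visual-only peak at frame 5 is attenuated by fusion and is detected only when the threshold is lowered. This mirrors, in a simplified way, the abstract's remark that events in the fused curve may be enhanced or suppressed.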
Similar resources
The capacity of audiovisual integration is limited to one item.
The human visual attention system is geared toward detecting the most salient and relevant events in an overwhelming stream of information. There has been great interest in measuring how many visual events can be processed at a time, and most of the work has suggested that the limit is three to four. However, attention to a visual stimulus can also be driven by a synchronous auditory event. The...
Visual Attention Based Salient Object Motion Detection in Spatio Temporal Volume
We present different visual attention based salient object detection methods for effectively detecting object in structured environment. Human brains pay more attention towards some important part of image sequences. Those attentions are extraordinary fast and realistic one. Computation of such salient object detection like intelligence behavior is very difficult task for implement...
Exploiting Temporal Sequence Structure for Semantic Analysis of Multimedia
Automatic deduction of semantic labels for audiovisual data requires awareness of context, which in turn requires processing sequences of audiovisual scenes or events. The representation of such sequences is important for semantic analysis tasks. Whereas, conventionally, sequences of specific short-duration event labels, often hand-annotated for learning detectors or classifiers, have been used...
Enhanced attention to speaking faces versus other event types emerges gradually across infancy.
The development of attention to dynamic faces versus objects providing synchronous audiovisual versus silent visual stimulation was assessed in a large sample of infants. Maintaining attention to the faces and voices of people speaking is critical for perceptual, cognitive, social, and language development. However, no studies have systematically assessed when, if, or how attention to speaking ...
Salient regions detection in satellite images using the combination of MSER local features detector and saliency models
Nowadays, due to quality development of satellite images, automatic target detection on these images has attracted many researchers' attention. Remote-sensing images follow various geospatial targets; these targets are generally man-made and have a distinctive structure from their surrounding areas. Different methods have been developed for automatic target detection. In most of these met...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008